Journal Publications:

a. “Optical Computer Recognition of Facial Expressions Associated with
Stress Induced by Performance Demands”. D. Dinges, R. Rider, J.
Analysis of Suspicious Behavior from Human Gestures and Movement,
Vogler and D. Metaxas. Procs. CVPR 2004.

h. Human Gait Recognition. R. Zhang, C. Vogler and D. Metaxas. Procs
2004). June 2004.
Tracking. X. Huang, S. Zhang, Y. Wang, D. Metaxas and D. Samaras.

2. Interactions/Transitions:

a. We have presented our work at all major conferences and related meetings
(CVPR, ICCV), as well as at the annual progress report meeting held by Dr.
Herklotz. In addition, I have given several distinguished lectures on this
work at major universities, such as Columbia University, Harvard, and
MIT.

b. Consultative and advisory functions to other laboratories and agencies: I
am currently working actively with UPENN and the University of Arizona on
security issues.

c. Transitions. Our aim is to continue the development of this work and
deploy it in airports, buildings, airplanes, and embassies, as well as in prisons.
We are in the process of making contacts to transition some of this
technology for security purposes, including possible deployment in Iraq and at
other security installations.

3. New discoveries, inventions, or patent disclosures: For the first time, we have
shown that it is possible to computationally distinguish normal from abnormal
gait using visual input, as well as to focus on a person’s gestures and facial
expressions. We intend to patent some of our findings in the near future.

a. Human body tracking and activity recognition: (i) At a medium scale, with
cameras that can resolve details of people, we observe human kinesic
behaviors (postures, head and trunk movements, and gestures) and
proxemic behaviors (distancing and approach-avoidance). Since we are
able to extract detailed body information in four dimensions (three spatial
dimensions plus time) for people in a specific room and at close range to the
camera, we will be able to detect suspicious behaviors. To achieve this
goal, we will need predefined criteria and features provided by experts in
behavioral analysis. (ii) At a larger scale, with broad-scope
cameras, we can observe the whole-body motions of people within the
surveillance area and track them using difference images with background
and foreground modeling. Tracking multiple people enables us to detect
unusual paths, velocities, and accelerations as clues to deception; in
this way, we will be able to build a model of human path selection based
on movement style and content. Figure 1 shows two frames of a
specific person being tracked in a crowd for human activity recognition.
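The larger-scale pipeline in (ii), which segments people with difference images and then inspects the kinematics of their paths, can be sketched roughly as below. This is a minimal illustration under our own assumptions, not the tracker used in this work: only a running-average background model is used (learning rate `alpha` is hypothetical), foreground pixels are those whose absolute difference exceeds a hypothetical `threshold`, a single person is located by the foreground centroid, velocity and acceleration come from finite differences, and "unusual" samples are flagged with a simple z-score test. All function names are hypothetical.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Running-average background model (an assumed, simple choice)."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=40.0):
    """Difference image against the background, thresholded to a boolean mask."""
    return np.abs(frame.astype(float) - background) > threshold

def foreground_centroid(mask):
    """(row, col) centroid of the foreground pixels, or None if there are none."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return np.array([rows.mean(), cols.mean()])

def path_kinematics(positions, dt=1.0):
    """Finite-difference velocity and acceleration along an (N, 2) path."""
    p = np.asarray(positions, dtype=float)
    velocity = np.diff(p, axis=0) / dt
    acceleration = np.diff(velocity, axis=0) / dt
    return velocity, acceleration

def flag_unusual(vectors, z_thresh=3.0):
    """Flag samples whose magnitude deviates strongly from the track's mean."""
    mags = np.linalg.norm(vectors, axis=1)
    std = mags.std()
    if std == 0:
        return np.zeros(len(mags), dtype=bool)
    return np.abs(mags - mags.mean()) / std > z_thresh

# Synthetic demo: a bright 4x4 "person" moving right by 2 px per frame.
background = np.zeros((20, 20))
track = []
for t in range(6):
    frame = np.zeros((20, 20))
    frame[8:12, 2 + 2 * t:6 + 2 * t] = 255.0
    centroid = foreground_centroid(foreground_mask(background, frame))
    if centroid is not None:
        track.append(centroid)
    background = update_background(background, frame)

velocity, acceleration = path_kinematics(track)
print(velocity[0], flag_unusual(velocity).any())
```

With several people in view, the single centroid would be replaced by connected-component labeling plus data association between frames; the kinematic features would then be computed per track.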

b. Face tracking and emotional state recognition based on facial features:
We have developed a fast and robust face detection technique for
extracting faces from moving and/or cluttered backgrounds and under
different lighting conditions. This is the first step, and it is crucial for facial
analysis. After a face region is detected in an image (the first frame of a
short sequence), we use a fusion of Kalman-filter-based methods and ASM
(Active Shape Models) to extract the 2D facial features while the individual
is speaking or making facial expressions. The estimated 2D facial points
are then used as input to our model-based 3D face tracker. The key
contributions of the extracted 3D information are the automatic estimation
of the head pose and of the relative depth of the estimated features. In
this way, we overcome the head pose limitations commonly encountered in
existing applications. The 3D face tracking method is robust
under strong head rotations/movements and occlusions. Moreover, the
deformable face model we use gives us information about 3D distances
between specific facial points (features) and their motion (deformation)
over time. Because we can extract 3D facial information in close to real
time, we can represent our results in any format appropriate for facial
expression recognition, emotional state recognition from facial features,
and/or person identification from facial features. Figure 2 illustrates
facial feature detection and tracking in 2D, as well as the 3D face tracking
result, for an input frame.
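The Kalman-filter side of the 2D feature tracker described above can be illustrated with a standard constant-velocity filter for a single facial landmark. This is only a sketch under our own assumptions and is not the authors' fusion with ASM: the state is position plus velocity, the noise levels `q` and `r` and the initial uncertainty are hypothetical, and in the fused scheme the `measurement` passed to `update` would come from an ASM fit, which is not shown. The class name is hypothetical.

```python
import numpy as np

class ConstantVelocityKalman2D:
    """Kalman filter with state [x, y, vx, vy] and position-only measurements."""

    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])          # initial state, zero velocity
        self.P = np.eye(4) * 10.0                      # initial uncertainty (assumed)
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only position is observed
        self.Q = np.eye(4) * q                         # process noise (hypothetical)
        self.R = np.eye(2) * r                         # measurement noise (hypothetical)

    def predict(self):
        """Propagate the state and covariance one frame ahead."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, measurement):
        """Correct the prediction with a measured (x, y) landmark position."""
        z = np.asarray(measurement, dtype=float)
        y = z - self.H @ self.x                        # innovation
        S = self.H @ self.P @ self.H.T + self.R        # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

# Demo: a landmark moving at constant velocity (1, 2) px/frame, noise-free.
kf = ConstantVelocityKalman2D(0.0, 0.0)
for t in range(1, 51):
    kf.predict()
    kf.update((1.0 * t, 2.0 * t))
print(kf.x)  # position estimate near (50, 100), velocity near (1, 2)
```

Between frames, the predicted position can serve as the starting point for a shape-model search, which is one plausible way such a filter and an ASM could be combined.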

c. Head and hands blob-based tracking and emotional state recognition: The
first important step towards gesture analysis and emotional state
recognition/deception detection from visual input is to detect and track the
body parts of interest, i.e. the head and hands. Although research
efforts have investigated this issue at the recognition level, accurate,
real-time tracking of people and their body parts remains an open problem in
the computer science community. Using color analysis, eigenspace-based
shape segmentation, and Kalman filters, we have been able to track the
position, size, and angle of different body parts with high accuracy at
rates close to 50 fps. Blob analysis extracts hand and face regions from an
image sequence using their color distribution, with prior
knowledge drawn from a skin color database. The left hand, right
hand, and face blobs are then tracked continuously using motion information
over time. From the positions and movements of the hands and face, we
can also make further inferences about the torso and about the relation of
each body part to other people and objects. By tracking the hands and the
head, we are able to extract movement signatures, positions, velocities,
accelerations, and relative positions between the extracted blobs. We are
able to automatically detect events such as the two hands coming together
or a hand touching the face, and to estimate the frequency and duration of
these events. For the recognition of two basic emotional states, i.e.
“over-controlled” and “relaxed”, we implemented a hierarchical recognition
scheme, based on Hidden Markov Models, that takes visual cues as input
and draws a conclusion about the individual’s behavior. To construct
this recognition scheme, we used training and testing data from both
actors and real interrogation scenarios with ground truth. Figure
3 shows the head and hand blob extraction, as well as the shoulder
detection results, for an input frame.
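The event detection and HMM-based recognition described above can be sketched as follows. This is a simplified, flat illustration under our own assumptions, not the hierarchical scheme used in this work: a hand-to-face "contact" is assumed whenever a hypothetical per-frame distance between hand and face blob centroids falls below a threshold, each frame is reduced to a binary symbol, and one discrete HMM per candidate class is scored with the scaled forward algorithm. In practice one model per emotional state (e.g. “over-controlled” and “relaxed”) would be trained on labeled data; the two models below (`model_a`, `model_b`) use arbitrary toy parameters chosen only so the example runs, and they say nothing about which behaviors indicate which state. All names are hypothetical.

```python
import numpy as np

def detect_events(dist, threshold):
    """Return (start_frame, duration) for each run of frames with dist < threshold."""
    events, start = [], None
    for i, d in enumerate(dist):
        if d < threshold and start is None:
            start = i
        elif d >= threshold and start is not None:
            events.append((start, i - start))
            start = None
    if start is not None:  # event still ongoing at the last frame
        events.append((start, len(dist) - start))
    return events

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for discrete observations.
    A[i, j] = P(next state j | state i), B[i, k] = P(symbol k | state i)."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik

def classify(obs, models):
    """Pick the label whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))

# Toy models: hidden state 0 mostly emits "no contact" (0), state 1 mostly "contact" (1).
B = np.array([[0.95, 0.05], [0.3, 0.7]])
models = {
    "model_a": (np.array([0.9, 0.1]), np.array([[0.95, 0.05], [0.5, 0.5]]), B),
    "model_b": (np.array([0.1, 0.9]), np.array([[0.5, 0.5], [0.05, 0.95]]), B),
}

# Hypothetical hand-to-face distances (pixels) over 12 frames.
dist = np.array([80, 75, 20, 15, 18, 70, 65, 10, 12, 60, 70, 75], dtype=float)
threshold = 30.0
events = detect_events(dist, threshold)
print(events)  # [(2, 3), (7, 2)]: event start frames and durations in frames
obs = (dist < threshold).astype(int)
print(classify(obs, models))
```

Event frequency follows directly from the number of detected runs per unit time, and the durations can serve as additional features alongside the per-frame symbols.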


Figure 1: Multiple people tracking: selecting and tracking a specific individual in a crowd

Figure 2: (a) Facial feature extraction after the face is detected; (b) 3D face model fitting from the 2D feature extraction.


Figure  3:  Head,  hands  and  shoulders  extraction 
14. ABSTRACT We have developed a robust framework for analyzing human motion, for the purposes
the algorithms for integrating the data should be general enough to allow contributions from
nonvisual sources, such as speech. This framework has immediate applications in the areas of
surveillance and interrogation. In surveillance, we can detect intruders through person
identification. In interrogation, we provide an invaluable backup for human interrogators and
psychologists by picking up subtle behavioral cues from human motion that an interrogator
might miss. By recognizing these cues, we can offer valuable information to interrogators. Our
system opens the way for the quantitative analysis of human communication in general.
Collaborations:  University  of  Arizona  and  UPENN  on  many  of  the  above  applications. 